Figure 5.17 shows the boxplot of the TMM-normalized data for each sample. The black line
dividing each box represents the median of the count data, the top of the box shows the
upper quartile, and the bottom of the box shows the lower quartile. The top and bottom
whiskers extend to the highest and lowest count values that are not outliers, respectively,
and the circles indicate the outliers.
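To make the boxplot elements concrete, the following minimal sketch computes them for a small vector of made-up values (the vector x is hypothetical and not part of the example data). It relies on base R's boxplot.stats, which by default places the whiskers at the most extreme values within 1.5 times the box height and reports everything beyond them as outliers. The hinges it returns are close to, but not always identical to, the quartiles.

```r
# Hypothetical values for one sample; 50 is deliberately extreme.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

# Default coef = 1.5: whiskers reach the most extreme points
# within 1.5 * (upper hinge - lower hinge) of the box.
s <- boxplot.stats(x)

# Lower whisker, lower hinge, median, upper hinge, upper whisker.
print(s$stats)

# Values beyond the whiskers; boxplot() draws these as circles.
print(s$out)
```

Here the upper whisker stops at 9 rather than at the maximum, because 50 lies beyond the whisker range and is reported as an outlier instead.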
We can use multidimensional scaling (MDS) [35] to represent graphically the relationships
between samples in multidimensional space and to show the overall differences between the
gene expression profiles of the different samples. MDS uses pairwise distances between
samples, where the distance between two samples is the leading log-fold change (logFC) for
the genes that best distinguish that pair of samples. The leading logFC is calculated as the
root-mean-square of the top log2-fold changes between the pair of samples (by default the
top 500 logFCs, top=500). edgeR uses the “plotMDS” function to draw the MDS plot.
The general syntax is as follows:
plotMDS(x, top = 500, gene.selection = "pairwise", method = "logFC", ...)
The samples are then represented graphically in two dimensions such that the distance
between points on the plot approximates their multivariate dissimilarity. Samples that
are closer together on the MDS plot are more similar than those that are farther apart. In
our example data, if the normal and tumor samples differ, then we should see the two groups
separate clearly on the plot.
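The leading logFC distance and the two-dimensional embedding described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code that plotMDS itself runs: the matrix logExpr is simulated, the helper leadingLogFC is a hypothetical name introduced here, and classical MDS via cmdscale stands in for the scaling step.

```r
# Simulated log2 expression matrix: 1000 genes x 6 samples (hypothetical data).
set.seed(1)
logExpr <- matrix(rnorm(1000 * 6), nrow = 1000,
                  dimnames = list(NULL, paste0("S", 1:6)))
# Shift 100 genes in samples S4-S6 to mimic a condition effect.
logExpr[1:100, 4:6] <- logExpr[1:100, 4:6] + 3

# Hypothetical helper: leading logFC between samples i and j, i.e. the
# root-mean-square of the `top` largest absolute log2-fold changes.
leadingLogFC <- function(m, i, j, top = 500) {
  lfc <- abs(m[, i] - m[, j])
  topLfc <- sort(lfc, decreasing = TRUE)[seq_len(min(top, length(lfc)))]
  sqrt(mean(topLfc^2))
}

# Fill a symmetric matrix of pairwise distances.
n <- ncol(logExpr)
d <- matrix(0, n, n, dimnames = list(colnames(logExpr), colnames(logExpr)))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    d[i, j] <- d[j, i] <- leadingLogFC(logExpr, i, j)
  }
}

# Classical MDS: two coordinates per sample whose Euclidean distances
# approximate the pairwise leading logFC distances.
coords <- cmdscale(as.dist(d), k = 2)
print(round(d, 2))
print(coords)
```

In this simulation the distance between a sample from S1-S3 and a sample from S4-S6 is larger than the distance between two samples from the same group, which is the kind of separation the MDS plot is meant to reveal.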
# Install RColorBrewer once if it is not already available
install.packages("RColorBrewer")
library(RColorBrewer)
# Send the plot to a PNG file
png(file = "MDSPlot.png")
# log2-transform the counts; adding 1 avoids log2(0)
pseudoCounts <- log2(yNorm$counts + 1)
# Assign one color per condition to each sample
colConditions <- brewer.pal(3, "Set2")
colConditions <- colConditions[match(sampleinfo$condition,
    levels(factor(sampleinfo$condition)))]
# Assign one plotting symbol per patient to each sample
patients <- c(8, 15, 16)[match(sampleinfo$patient,
    levels(factor(sampleinfo$patient)))]
plotMDS(pseudoCounts, pch = patients, col = colConditions,
    xlim = c(-2, 2))
legend("topright", lwd = 2, col = brewer.pal(3, "Set2")[1:2],
    legend = levels(factor(sampleinfo$condition)))
legend("bottomright", pch = c(8, 15, 16),
    legend = levels(factor(sampleinfo$patient)))
# Close the PNG device so that the file is written
dev.off()
As shown in Figure 5.18, the samples clearly cluster by condition: normal samples are
grouped together, and tumor samples are grouped together. This indicates that the
difference between the two groups is much larger than the differences within groups,
and it is likely that the between-group difference is statistically